Information Complexity and the Quest for Interactive Compression

(Survey) 

Omri Weinstein* 


Abstract 

Information complexity is the interactive analogue of Shannon’s classical information theory. 
In recent years this field has emerged as a powerful tool for proving strong communication lower 
bounds, and for addressing some of the major open problems in communication complexity 
and circuit complexity. A notable achievement of information complexity is the breakthrough 
in understanding of the fundamental direct sum and direct product conjectures, which aim 
to quantify the power of parallel computation. This survey provides a brief introduction to 
information complexity, and overviews some of the recent progress on these conjectures and 
their tight relationship with the fascinating problem of compressing interactive protocols. 


1 Introduction 

The holy grail of complexity theory is proving lower bounds on different computational models,
thereby delimiting computational problems according to the resources they require for solving. One
of the most useful abstractions for proving such lower bounds is communication complexity. Since
its introduction [Yao79], this model has had a profound impact on nearly every field of theoretical
computer science, including VLSI chip design, data structures, mechanism design, property testing
and streaming algorithms [Wac90, PW10, DN11, BBM12] to mention a few, and constitutes one
of the few known tools for proving unconditional lower bounds. As such, developing new tools in
communication complexity is a promising approach for making progress within computational
complexity, and in particular, for proving strong circuit lower bounds that appear viable (such as
Karchmer-Wigderson games and ACC lower bounds [KW88, BT91]).

Of particular interest are “black box” techniques for proving lower bounds, also known as
“hardness amplification” methods (which morally enable strong lower bounds on composite problems via
lower bounds on a simpler primitive problem). Classical examples of such results are the Parallel
Repetition theorem [Raz98, Rao08] and Yao’s XOR Lemma [Yao82], both of which are cornerstones
of complexity theory. This is the principal motivation for studying the direct sum and direct product
conjectures, which are at the core of this survey.

Perhaps the most notable tool for studying communication problems is information theory,
introduced by Shannon in the late 1940s in the context of (one-way) data transmission problems
[Sha48]. Shannon’s noiseless coding theorem revealed the tight connection between communication
and information, namely, that the amortized description length of a random one-way message $M$
is equivalent to the amount of information it contains:

$$\lim_{n \to \infty} \frac{C(M^n)}{n} = H(M), \qquad (1)$$

where $M^n$ denotes $n$ i.i.d. observations from $M$, $C$ is the minimum number of bits of a string from
which $M^n$ can be recovered (w.h.p.), and $H(\cdot)$ is Shannon’s entropy function. In the 65 years that
elapsed since then, information theory has been widely applied and developed, and has become the
primary mathematical tool for analyzing communication problems.

Although classical information theory provides a complete understanding of the one-way
transmission setup (where only one party communicates), it does not readily convert to the interactive
setup, such as the (two-party) communication complexity model. In this model, two players (Alice
and Bob) receive inputs $x$ and $y$ respectively, which are jointly distributed according to some prior
distribution $\mu$, and wish to compute some function $f(x, y)$ while communicating as little as possible.
To do so, they engage in a communication protocol, and are allowed to use both public and private
randomness. A natural extension of Shannon’s entropy to the interactive setting is the Information
Complexity of a function, $\mathsf{IC}_\mu(f, \varepsilon)$, which informally measures the average amount of information
the players need to disclose to each other about their inputs in order to solve $f$ with some prescribed
error under the input distribution $\mu$. From this perspective, communication complexity can be
viewed as the extension of transmission problems to general tasks performed by two parties over a
noiseless channel (the noisy case has recently received a lot of attention as well [Bra14]). Interestingly,
it turns out that an analogue of Shannon’s theorem does in fact hold for interactive computation,
asserting that the amortized communication cost of computing many independent copies of any
function $f$ is precisely equal to its single-copy information complexity:

Theorem 1.1 (“Information = Amortized Communication”, [BR11]). For any $\varepsilon > 0$ and any
two-party communication function $f(x, y)$,

$$\lim_{n \to \infty} \frac{D_{\mu^n}(f^n, \varepsilon)}{n} = \mathsf{IC}_\mu(f, \varepsilon).$$

Here $D_{\mu^n}(f^n, \varepsilon)$ denotes the minimum communication required for solving $n$ independent instances
of $f$ with error at most $\varepsilon$ on each copy.^1 The above theorem assigns an operational meaning to
information complexity, namely, one which is grounded in reality (in fact, it was recently shown
that this characterization is a “sharp threshold”, see Theorem 4.8).


Theorem 1.1, and some of the additional results mentioned in this survey, provide strong
evidence that information theory is the “right” tool for studying interactive communication problems.
One general benefit of information theory in addressing communication problems is that it
provides a set of simple yet powerful tools for reasoning about transmission problems and, more broadly,
about quantifying relationships between interdependent random variables and conditional events.
These tools include mutual information, the chain rule, and the data processing inequality [CT91].
Another, arguably most important benefit, is the additivity of information complexity under
composition of independent tasks (Lemma 3.1 below). This is the main reason that information theory,
unlike other analytic or combinatorial methods, is apt to give exact bounds on rates and capacities
(such as Shannon’s noiseless coding theorem and Theorem 1.1). It is this benefit that has been
primarily used in prior works (which are beyond the scope of this survey) involving
information-theoretic applications in communication complexity, circuit complexity, streaming, machine
learning and privacy ([CSWY01, ITmnl, CKS03, BYJKS04, JKR09, BGPW13, ZDJW13, WZ14] to
mention a few).

^1 Indeed, the “≤” direction of this proof gives a protocol with overall success only ≈ $(1 - \varepsilon)^n$ on all $n$ copies, see
[BR11].

One caveat is that mathematically striking characterizations such as the noiseless coding
theorem only become possible in the limit, where the number of independent samples transmitted
over the channel (i.e., the block-length) grows to infinity. One exception is Huffman’s “one-shot”
compression scheme (aka Huffman coding, [Huf52]), which shows that the expected number of bits
$C(M)$ needed to transmit a single sample from $M$ is very close (but not equal!) to the optimal
rate:

$$H(M) \le C(M) \le H(M) + 1. \qquad (2)$$

Huffman’s theorem of course implies Shannon’s theorem (since entropy is additive over independent
variables), but is in fact much stronger, as it asserts that the optimal transmission rate can be
(essentially) achieved using a much smaller block length. Indeed, what happens for small block
lengths is of importance for both practical and theoretical reasons, and it will be even more so in the
interactive regime. While Theorem 1.1 provides an interactive analogue of Shannon’s theorem, an
intriguing question is whether an interactive analogue of Huffman’s “one-shot” compression scheme
exists. When the number of communication rounds of a protocol is small (constant), compressing
it can morally^2 be done by applying Huffman’s compression scheme to each round of the protocol,
since (2) would entail at most a constant overhead in communication. However, when the number
of rounds is huge compared to the overall information revealed by the protocol (e.g., when each
round reveals $\ll 1$ bits of information), this approach is doomed to fail, as it would “spend” at
least 1 bit of communication per round. Circumventing this major obstacle and the important
implications of this (unsettled) question to the direct sum and product conjectures are extensively
discussed in Sections 4 and 5.
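The gap between one-shot and amortized coding discussed above can be checked numerically. The following is a minimal sketch, not taken from the survey: it builds a binary Huffman code for blocks of $n$ i.i.d. biased bits and compares the per-symbol expected length with the entropy. The bias `P_ONE`, the block lengths, and all function names are illustrative assumptions.

```python
import heapq
from itertools import product
from math import log2


def huffman_lengths(probs):
    """Codeword length of each symbol in a binary Huffman code for `probs`."""
    # Heap items: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol below them one level deeper.
        for symbol in group1 + group2:
            lengths[symbol] += 1
        heapq.heappush(heap, (p1 + p2, next_id, group1 + group2))
        next_id += 1
    return lengths


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)


def expected_length(probs):
    """Expected number of bits used by the Huffman code for one sample."""
    return sum(p * l for p, l in zip(probs, huffman_lengths(probs)))


def block_distribution(p_one, n):
    """Distribution of n i.i.d. Bernoulli(p_one) bits, one entry per n-bit string."""
    return [p_one ** sum(bits) * (1 - p_one) ** (n - sum(bits))
            for bits in product((0, 1), repeat=n)]


P_ONE = 0.1  # assumed bias of the source bit
h = entropy([P_ONE, 1 - P_ONE])
# Per-symbol rate when Huffman coding is applied to blocks of n symbols.
rates = {n: expected_length(block_distribution(P_ONE, n)) / n for n in (1, 2, 4, 8)}

for n, rate in rates.items():
    print(f"n={n}: rate={rate:.4f} bits/symbol, entropy={h:.4f}")
```

With block length 1 the code must spend a full bit per sample even though the entropy is well below 1; this is the same "at least 1 bit per round" obstacle that arises when many low-information rounds are compressed one at a time.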

Due to space constraints, this survey is primarily focused on the above relationship between
information and communication complexity. As mentioned above, information complexity has
recently found many more exciting applications in complexity theory - to interactive coding,
streaming lower bounds, extension complexity and multiparty communication complexity (e.g.,
[BYJKS04, BM12, BP13, BEO+13]). Such applications are beyond the scope of this survey.

Organization. We begin with a brief introduction to information complexity and some of its
main properties (Section 2). In Section 4 we give an overview of the direct sum and direct product
conjectures and their relationship to interactive compression, in light of recent developments in
the field. Section 5 describes state-of-the-art interactive compression schemes. We conclude with
several natural open problems in the final section. In an effort to keep this survey as readable and
self-contained as possible, we shall sometimes be loose on technical formulations, often ignoring constants
and technical details which are not essential to the reader.

^2 This is not accurate, since unlike the one-way transmission setting, in this setting the receiver has “side
information” about the transmitted message, e.g., when Bob sends the second message of the protocol, Alice has a prior
distribution on this message conditioned on her input $X$ and the first message of the protocol $M_1$ which she sent
before. Nevertheless, using ideas from rejection sampling, such simulation is possible in the “one-shot” regime with
$O(1)$ communication overhead per message [HJMR07, BR11].















2 Model and Preliminaries 


The following background contains basic definitions and notations used throughout this survey. For
a more detailed overview of communication and information complexity, we refer the reader to an
excellent monograph by Braverman [Bra m-

For a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$, a distribution $\mu$ over $\mathcal{X} \times \mathcal{Y}$, and a parameter $\varepsilon > 0$, $D_\mu(f, \varepsilon)$
denotes the communication complexity of the cheapest deterministic protocol computing $f$ on
inputs sampled according to $\mu$ with error $\varepsilon$. $R(f, \varepsilon)$ denotes the cost of the best randomized public
coin protocol which computes $f$ with error at most $\varepsilon$, for all possible inputs $(x, y) \in \mathcal{X} \times \mathcal{Y}$. When
measuring the communication cost of a particular protocol $\pi$, we sometimes use the notation $\|\pi\|$ for
brevity. Essentially all results in this survey are proven in the former distributional communication
model (since information complexity is meaningless without a prior distribution on inputs), but
most lower bounds below can be extended to the randomized model via Yao’s minimax theorem.
For the sake of concreteness, all of the results in this article are stated for (total) functions, though
most of them apply to partial functions and relations as well.

2.1 Information Theory 

Proofs of the claims below and a broader introduction to information theory can be found in [CT91].
The most basic concept in information theory is Shannon’s entropy, which informally captures how
predictable a random variable is:

Definition 2.1 (Entropy). The entropy of a random variable $A$ is $H(A) := \sum_a \Pr[A = a] \log(1/\Pr[A = a])$.
The conditional entropy $H(A|B)$ is defined as $\mathbb{E}_{b \sim B}[H(A \mid B = b)]$.

A key measure in this article is the Mutual Information between two random variables, which
quantifies the amount of correlation between them:

Definition 2.2 (Mutual Information). The mutual information between two random variables $A, B$,
denoted $I(A; B)$, is defined to be the quantity $H(A) - H(A|B) = H(B) - H(B|A)$. The conditional
mutual information $I(A; B|C)$ is $H(A|C) - H(A|BC)$.
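As a small numerical illustration of Definitions 2.1 and 2.2, the sketch below computes entropy, conditional entropy and mutual information directly from a joint probability table. The example distribution (a uniform bit observed through a channel that flips it with probability 0.1) and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from math import log2


def marginal(joint, idx):
    """Marginal of a joint pmf (dict: outcome tuple -> probability) on coordinates idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def entropy(pmf):
    """H = sum_a Pr[a] * log2(1 / Pr[a])."""
    return sum(p * log2(1.0 / p) for p in pmf.values() if p > 0)


def cond_entropy(joint, a, b):
    """H(A|B) as the average over b of H(A | B = b), following Definition 2.1."""
    total = 0.0
    for bval, pb in marginal(joint, b).items():
        if pb == 0:
            continue
        # Conditional distribution of the full outcome given B = bval.
        conditioned = {k: p / pb for k, p in joint.items()
                       if tuple(k[i] for i in b) == bval}
        total += pb * entropy(marginal(conditioned, a))
    return total


def mutual_information(joint, a, b):
    """I(A;B) = H(A) - H(A|B), following Definition 2.2."""
    return entropy(marginal(joint, a)) - cond_entropy(joint, a, b)


# Assumed example: A is a uniform bit, B equals A flipped with probability FLIP.
FLIP = 0.1
joint = {(a, b): 0.5 * (FLIP if a != b else 1 - FLIP)
         for a in (0, 1) for b in (0, 1)}

i_ab = mutual_information(joint, (0,), (1,))   # H(A) - H(A|B)
i_ba = mutual_information(joint, (1,), (0,))   # H(B) - H(B|A)
binary_entropy = entropy({0: FLIP, 1: 1 - FLIP})

print(i_ab, i_ba, 1 - binary_entropy)
```

The two orderings agree, as the symmetry in Definition 2.2 requires, and both equal one minus the binary entropy of the flip probability.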

A related distance measure between distributions is the Kullback-Leibler (KL) divergence

$$D(p \| q) := \sum_x p(x) \log \frac{p(x)}{q(x)}.$$

We shall sometimes abuse the notation and write $D(A|_c \,\|\, B|_c)$ to denote the KL divergence
between the associated distributions of the random variables $(A \mid C = c)$ and $(B \mid C = c)$. The
following connection between divergence and mutual information is well known:


Lemma 2.3 (Mutual information in terms of Divergence).

$$I(A; B|C) = \mathbb{E}_{b,c}\left[ D(A|_{bc} \,\|\, A|_c) \right] = \mathbb{E}_{a,c}\left[ D(B|_{ac} \,\|\, B|_c) \right].$$

Intuitively, the above equation asserts that, if the mutual information between $A$ and $B$
(conditioned on $C$) is large, then the distribution of $(A|_c)$ is “far” from $(A|_{bc})$ for average values of $b, c$
(this captures the fact that the “additional information” $B$ provides on $A$ given $C$ is large). One
of the most useful properties of Mutual Information and KL Divergence is the chain rule:










Lemma 2.4 (Chain Rule). Let $A, B, C, D$ be four random variables in the same probability space.
Then

$$I(AB; C|D) = I(A; C|D) + I(B; C|AD)$$
$$= \mathbb{E}_{c,d}\left[ D(A|_{cd} \,\|\, A|_d) \right] + \mathbb{E}_{a,c,d}\left[ D(B|_{acd} \,\|\, B|_{ad}) \right].$$

Lemma 2.5 (Conditioning on independent variables does not decrease information). Let $A, B, C, D$
be four random variables in the same probability space. If $A$ and $D$ are conditionally independent
given $C$, then it holds that $I(A; B|C) \le I(A; B|CD)$.

Proof. We apply the chain rule for mutual information twice. On one hand, we have $I(A; BD|C) =
I(A; B|C) + I(A; D|CB) \ge I(A; B|C)$ since mutual information is nonnegative. On the other hand,
$I(A; BD|C) = I(A; D|C) + I(A; B|CD) = I(A; B|CD)$ since $I(A; D|C) = 0$ by the independence
assumption on $A$ and $D$. Combining both equations completes the proof. □
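The chain rule and Lemma 2.5 can be sanity-checked on a small example. The sketch below is an illustration under stated assumptions, not part of the original proofs: it draws a random joint distribution on four bits $(A, B, C, D)$ in which $A$ and $D$ are conditionally independent given $C$ by construction, and evaluates both sides of each statement via entropies. The seed, the parameter ranges and the helper names are arbitrary choices.

```python
import random
from collections import defaultdict
from itertools import product
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal of `joint` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def I(joint, a, b, c=()):
    """Conditional mutual information I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)


def bern(p_one, value):
    """Probability that a Bernoulli(p_one) bit equals `value`."""
    return p_one if value == 1 else 1.0 - p_one


# Assumed random distribution of the form p(c) p(a|c) p(d|c) p(b|a,c,d),
# which makes A and D conditionally independent given C.
rng = random.Random(0)
p_c = rng.uniform(0.1, 0.9)
p_a = {c: rng.uniform(0.1, 0.9) for c in (0, 1)}
p_d = {c: rng.uniform(0.1, 0.9) for c in (0, 1)}
p_b = {acd: rng.uniform(0.1, 0.9) for acd in product((0, 1), repeat=3)}

joint = {}
for a, b, c, d in product((0, 1), repeat=4):
    joint[(a, b, c, d)] = (bern(p_c, c) * bern(p_a[c], a) * bern(p_d[c], d)
                           * bern(p_b[(a, c, d)], b))

A, B, C, D = (0,), (1,), (2,), (3,)

# Lemma 2.4: I(AB; C | D) = I(A; C | D) + I(B; C | AD).
chain_lhs = I(joint, A + B, C, D)
chain_rhs = I(joint, A, C, D) + I(joint, B, C, A + D)

# Lemma 2.5: I(A; B | C) <= I(A; B | CD) whenever I(A; D | C) = 0.
independence = I(joint, A, D, C)
without_d = I(joint, A, B, C)
with_d = I(joint, A, B, C + D)

print(chain_lhs, chain_rhs)
print(independence, without_d, with_d)
```

Dropping the factorization (for example, letting the distribution of $D$ depend on $A$) can make the inequality of Lemma 2.5 fail, which is why the independence assumption is needed.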


Throughout this article, we denote by $|p - q|$ the total variation distance between the
distributions $p$ and $q$. Pinsker’s inequality bounds statistical distance in terms of the KL divergence. It
will be useful for analysis of the interactive compression schemes in Section 5.

Lemma 2.6 (Pinsker’s inequality). $|p - q|^2 \le \frac{1}{2} \cdot D(p \| q)$.
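Both Lemma 2.3 and Pinsker’s inequality are easy to test numerically. The sketch below assumes logarithms in base 2 and takes $|p - q|$ to be half the $\ell_1$ distance; under these conventions the stated bound $|p - q|^2 \le \frac{1}{2} D(p \| q)$ holds with some slack. The example distributions are arbitrary choices made for illustration.

```python
from math import log2


def kl(p, q):
    """KL divergence D(p||q) in bits; p and q are dicts over the same support."""
    return sum(pv * log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def tv(p, q):
    """Total variation distance, i.e. half of the L1 distance."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# Assumed example: A is a uniform bit, B equals A flipped with probability FLIP.
FLIP = 0.1
joint = {(a, b): 0.5 * (FLIP if a != b else 1 - FLIP)
         for a in (0, 1) for b in (0, 1)}
p_a = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
p_b = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
a_given = {b: {a: joint[(a, b)] / p_b[b] for a in (0, 1)} for b in (0, 1)}

# Lemma 2.3 with trivial C: I(A;B) = E_b[ D(A|b || A) ].
mi_from_divergence = sum(p_b[b] * kl(a_given[b], p_a) for b in (0, 1))
# Closed form for this channel: 1 - h(FLIP), with h the binary entropy.
mi_closed_form = 1 + FLIP * log2(FLIP) + (1 - FLIP) * log2(1 - FLIP)

# Lemma 2.6 on a few hand-picked pairs of distributions.
pairs = [
    (a_given[0], p_a),
    ({0: 0.5, 1: 0.5}, {0: 0.99, 1: 0.01}),
    ({0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.7}),
]
pinsker_holds = [tv(p, q) ** 2 <= 0.5 * kl(p, q) for p, q in pairs]

print(mi_from_divergence, mi_closed_form, pinsker_holds)
```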


2.2 Interactive Information complexity 

Given a communication protocol $\pi$, $\pi(x, y)$ denotes the concatenation of the public randomness
with all the messages that are sent during the execution of $\pi$ (for information purposes, this is
without loss of generality, since the public string $R$ conveys no information about the inputs).
We call this the transcript of the protocol. When referring to the random variable denoting the
transcript, rather than a specific transcript, we will use the notation $\Pi(x, y)$, or simply $\Pi$ when
$x$ and $y$ are clear from the context.

Definition 2.7 (Internal Information Cost [CSWY01, BBCR10]). The (internal) information cost
of a protocol over inputs drawn from a distribution $\mu$ on $\mathcal{X} \times \mathcal{Y}$, is given by:

$$\mathsf{IC}_\mu(\pi) := I(\Pi; X|Y) + I(\Pi; Y|X). \qquad (3)$$

Intuitively, the definition in (3) captures how much additional information the two parties learn
about each other’s inputs by observing the protocol’s transcript. For example, the information cost
of the trivial protocol in which Alice and Bob simply exchange their inputs, is simply the sum of
their conditional marginal entropies $H(X|Y) + H(Y|X)$ (notice that, in contrast, the communication
cost of this protocol is $|X| + |Y|$ which can be arbitrarily larger than the former quantity).
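To make the trivial-protocol example concrete, the following sketch evaluates definition (3) for a transcript that consists of the two raw inputs, and compares it with the number of bits exchanged. The input distribution (two 3-bit inputs that coincide with probability roughly 0.9) is a hypothetical choice, as are the helper names.

```python
from collections import defaultdict
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal of `joint` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def I(joint, a, b, c=()):
    """Conditional mutual information I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)


# Assumed distribution: x uniform on {0..7}; y = x with probability 0.9,
# otherwise y is an independent uniform value.
joint = {}
for x in range(8):
    for y in range(8):
        p = (1 / 8) * (0.9 * (x == y) + 0.1 / 8)
        # Outcome layout: (x, y, alice_msg, bob_msg); the trivial protocol
        # sends the raw inputs, so the transcript is simply (x, y).
        joint[(x, y, x, y)] = p

X, Y, PI = (0,), (1,), (2, 3)

# Definition (3): IC = I(Pi; X | Y) + I(Pi; Y | X).
info_cost = I(joint, PI, X, Y) + I(joint, PI, Y, X)
# The same quantity written as H(X|Y) + H(Y|X).
entropy_form = (H(joint, X + Y) - H(joint, Y)) + (H(joint, X + Y) - H(joint, X))
comm_cost = 3 + 3  # bits needed to send two 3-bit inputs

print(info_cost, entropy_form, comm_cost)
```

Making the inputs more strongly correlated drives the information cost toward zero while the communication cost of the trivial protocol stays at 6 bits.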

Another information measure which makes sense in certain contexts is the external information
cost of a protocol, $\mathsf{IC}^{\mathsf{ext}}_\mu(\pi) := I(\Pi; XY)$, which captures what an external observer learns on average
about both players’ inputs by observing the transcript of $\pi$. This quantity will be of minor interest
in this survey (though it plays a central role in many applications). The external information cost
of a protocol is always at least as large as its (internal) information cost, since intuitively an external
observer is “more ignorant” to begin with. We remark that when $\mu$ is a product distribution, then
$\mathsf{IC}^{\mathsf{ext}}_\mu(\pi) = \mathsf{IC}_\mu(\pi)$ (see, e.g., [Bra12]).

One can now define the information complexity of a function $f$ with respect to $\mu$ and error $\varepsilon$
as the least amount of information the players need to reveal to each other in order to compute $f$
with error at most $\varepsilon$:

Definition 2.8. The Information Complexity of $f$ with respect to $\mu$ (and error $\varepsilon$) is

$$\mathsf{IC}_\mu(f, \varepsilon) := \inf_{\pi :\ \Pr_\mu[\pi(x, y) \neq f(x, y)] \le \varepsilon} \mathsf{IC}_\mu(\pi).$$


What is the relationship between the information and communication complexity of $f$? This
question is at the core of this survey. The answer to one direction is easy: Since one bit of
communication can never reveal more than one bit of information, the communication cost of any
protocol is always an upper bound on its information cost over any distribution $\mu$:

Lemma 2.9 ([BR11]). For any distribution $\mu$, $\mathsf{IC}_\mu(\pi) \le \|\pi\|$.

The answer to the other direction, namely, whether any protocol can be compressed to roughly
its information cost, will be partially given in the remainder of this article.


2.3 The role of private randomness in information complexity 

A subtle but vital issue when dealing with information complexity is understanding the role of
private vs. public randomness. In public-coin communication complexity, one often ignores the
usage of private coins in a protocol, as they can always be simulated by public coins. When dealing
with information complexity, the situation is somewhat the opposite: Public coins are essentially a
redundant resource (as it can be easily shown via the chain rule that $\mathsf{IC}_\mu(\pi) = \mathbb{E}_R[\mathsf{IC}_\mu(\pi_R)]$), while
the usage of private coins is crucial for minimizing the information cost, and fixing these coins is
prohibitive (once again, for communication purposes in the distributional model, one may always fix
the entire randomness of the protocol, via the averaging principle). To illustrate this point, consider
the simple example where in the protocol $\pi$, Alice sends Bob her 1-bit input $X \sim \mathrm{Ber}(1/2)$, XORed
with some random bit $Z$. If $Z$ is private, Alice’s message clearly reveals 0 bits of information to
Bob about $X$. However, for any fixing of $Z$, this message would reveal an entire bit(!). The general
intuition is that a protocol with low information cost would reveal information about the players’
inputs in a “careful manner”, and the usage of private coins serves to “conceal” parts of their inputs.
Indeed, it was recently shown that the restriction to public coins may cause an exponential blowup
in the information revealed compared to private-coin protocols ([GKR14, BMY14]). In fact, we
shall see later that quantifying this gap between public-coin and private-coin information
complexity is tightly related to the question of interactive compression.

For the remainder of this article, communication protocols $\pi$ are therefore assumed to use
private coins (and therefore such protocols are randomized even conditioned on the inputs $x, y$ and
$R$), and it is crucial that the information cost $\mathsf{IC}_\mu(\pi) = I(\Pi; X|YR) + I(\Pi; Y|XR)$ is measured
conditioned on the public randomness $R$, but never on the private coins of $\pi$.
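The one-bit XOR example from this subsection can be verified directly. The sketch below is an illustration only: it models Alice’s message $M = X \oplus Z$ and compares what Bob learns about $X$ when $Z$ is private (not conditioned on) with what he learns when $Z$ is public (conditioned on). Bob’s own input is omitted since it plays no role in this example; the helper names are assumptions.

```python
from collections import defaultdict
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal of `joint` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def I(joint, a, b, c=()):
    """Conditional mutual information I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)


# Outcome layout: (x, z, m) with x, z independent uniform bits and m = x XOR z.
joint = {(x, z, x ^ z): 0.25 for x in (0, 1) for z in (0, 1)}
X, Z, M = (0,), (1,), (2,)

leak_private = I(joint, M, X)      # Z is Alice's private coin: I(M; X)
leak_public = I(joint, M, X, Z)    # Z is public (or fixed): I(M; X | Z)

print(leak_private, leak_public)
```

The message is a one-time pad when $Z$ stays private and a plain copy of $X$ once $Z$ is known, which is exactly the gap between 0 bits and 1 bit described above.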

3 Additivity of Information Complexity 

Perhaps the single most remarkable property of information complexity is that it is a fully
additive measure over composition of tasks. This property is what primarily makes information
complexity a natural “relaxation” for addressing direct sum and product theorems. The main
ingredient of the following lemma appeared first in the works of [Raz08, Raz98] and more
explicitly in [BBCR10, BR11, Bra12]. In the following, $f^n$ denotes the function that maps the tuple
$((x_1, \ldots, x_n), (y_1, \ldots, y_n))$ to $(f(x_1, y_1), \ldots, f(x_n, y_n))$.


















Lemma 3.1 (Additivity of Information Complexity). $\mathsf{IC}_{\mu^n}(f^n, \varepsilon) = n \cdot \mathsf{IC}_\mu(f, \varepsilon)$.

Proof. The ($\le$) direction of the lemma is easy, and follows from a simple argument that applies
the single-copy optimal protocol independently to each copy of $f^n$, with independent randomness.
We leave the simple analysis of this protocol as an exercise to the reader.

The ($\ge$) direction is the main challenge. We will prove it in a contrapositive fashion: Let
$\Pi$ be an $\varepsilon$-error protocol for $f^n$, such that $\mathsf{IC}_{\mu^n}(\Pi) = I$ (recall that here $\varepsilon$ denotes the per-copy
error of $\Pi$ in computing $f(x_i, y_i)$). We shall use $\Pi$ to produce a single-copy protocol for $f$ whose
information cost is $\le I/n$, which would complete the proof. The guiding intuition for this is that
$\Pi$ should reveal $I/n$ bits of information about an average coordinate.

To formalize this intuition, let $(x, y) \sim \mu$, and denote $X := X_1 \ldots X_n$, $X_{\le i} := X_1 \ldots X_i$ and
$X_{-i} := X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n$, and similarly for $Y$, $Y_{\le i}$, $Y_{-i}$ (the variants $X_{<i}$, $Y_{>i}$ and
$Y_{\ge i}$ are defined analogously). A natural idea is for Alice and Bob
to “embed” their respective inputs $(x, y)$ to a (publicly chosen) random coordinate $i \in [n]$ of $\Pi$, and
execute $\Pi$. However, $\Pi$ is defined over $n$ input copies, so in order to execute it, the players need to
somehow “fill in” the remaining $(n - 1)$ coordinates, each according to $\mu$. How should this step be done?
The first attempt is for Alice and Bob to try and complete $X_{-i}, Y_{-i}$ privately. This approach fails
if $\mu$ is a non-product distribution, since there’s no way the players can sample $X$ and $Y$ privately,
such that $(X, Y) \sim \mu$ if $\mu$ correlates the inputs. The other extreme - sampling $X_{-i}, Y_{-i}$ using
public randomness only - would resolve the aforementioned correctness issue, but might leak too
much information: An instructive example to consider is where, in the first message of $\Pi$, Alice
sends Bob the XOR of the $n$ bits of her uniform input $X$: $M = X_1 \oplus X_2 \oplus \ldots \oplus X_n$. Conditioned
on $X_{-i}, Y_{-i}$, $M$ reveals 1 bit of information about $X_i$ to Bob, while we want to argue that in this
case, only $1/n$ bits are revealed about $X_i$. So this approach reveals too much information.

It turns out that the “right” way of breaking the dependence across the coordinates is to use a
combination of public and private randomness. Let us define, for each $i \in [n]$, the public random
variable

$$R_i := X_{<i}, Y_{>i}.$$

Note that given $R_i$, Alice can complete all her missing inputs $X_{>i}$ privately according to $\mu$, and
Bob can do the same for $Y_{<i}$. Let us denote by $\theta(x, y, i, R_i)$ the protocol transcript produced by
running $\Pi(X_1, \ldots, X_{i-1}, x, X_{i+1}, \ldots, X_n, Y_1, \ldots, Y_{i-1}, y, Y_{i+1}, \ldots, Y_n)$ and outputting its answer on
the $i$’th coordinate. Let $\Theta(x, y)$ be the protocol obtained by running $\theta(x, y, i, R_i)$ on a uniformly
selected $i \in [n]$.

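By definition, $\Pi$ computes $f^n$ with a per-copy error of $\varepsilon$, and thus in particular $\Theta(x, y) = f(x, y)$
with probability $\ge 1 - \varepsilon$. To analyze the information cost of $\Theta$, we write:

$$
\begin{aligned}
I(\Theta; x \mid y) &= \mathbb{E}_{i, R_i}\left[ I(\theta; x \mid y, R_i) \right] = \sum_{i=1}^{n} \frac{1}{n} \cdot I(\Pi; X_i \mid Y_i, R_i) \\
&= \frac{1}{n} \sum_{i=1}^{n} I(\Pi; X_i \mid Y_i, X_{<i} Y_{>i}) = \frac{1}{n} \sum_{i=1}^{n} I(\Pi; X_i \mid X_{<i} Y_{\ge i}) \\
&\le \frac{1}{n} \sum_{i=1}^{n} I(\Pi; X_i \mid X_{<i} Y) = \frac{1}{n} \cdot I(\Pi; X \mid Y),
\end{aligned}
$$

where the inequality follows from Lemma 2.5, since $I(Y_{<i}; X_i \mid X_{<i}) = 0$ by construction, and the last
transition is by the chain rule for mutual information. By symmetry of construction, an analogous
argument shows that $I(\Theta; y \mid x) \le I(\Pi; Y \mid X)/n$, and combining these facts gives

$$\mathsf{IC}_\mu(\Theta) \le \frac{1}{n} \left( I(\Pi; X \mid Y) + I(\Pi; Y \mid X) \right) = \frac{\mathsf{IC}_{\mu^n}(\Pi)}{n}. \qquad (4)$$

□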
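The XOR example from the proof can be worked out explicitly. In the sketch below (an illustration under simplifying assumptions), Alice holds $n = 4$ uniform bits and sends $M = X_1 \oplus \ldots \oplus X_n$; Bob’s inputs are dropped because they are irrelevant to this message. We compare the leakage about $X_i$ when all other coordinates are public, $I(M; X_i \mid X_{-i})$, with the leakage under the mixed sampling used in the proof, $I(M; X_i \mid X_{<i})$. The value of `N` and the helper names are arbitrary.

```python
from collections import defaultdict
from itertools import product
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal of `joint` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def I(joint, a, b, c=()):
    """Conditional mutual information I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)


N = 4  # assumed number of copies
joint = {}
for bits in product((0, 1), repeat=N):
    parity = 0
    for bit in bits:
        parity ^= bit
    # Outcome layout: (x_1, ..., x_N, m) with m the XOR of all input bits.
    joint[bits + (parity,)] = 2.0 ** -N

M = (N,)

# All other coordinates sampled publicly: condition on X_{-i}.
public_leak = [I(joint, M, (i,), tuple(j for j in range(N) if j != i))
               for i in range(N)]
# Mixed sampling as in the proof: only X_{<i} is public, X_{>i} stays private.
mixed_leak = [I(joint, M, (i,), tuple(range(i))) for i in range(N)]
average_mixed = sum(mixed_leak) / N

print(public_leak, mixed_leak, average_mixed)
```

With public sampling every coordinate leaks a full bit, whereas under the mixed scheme the per-coordinate terms sum to $I(M; X) = 1$ by the chain rule, so a uniformly random coordinate leaks $1/n$ bits on average, matching the intuition behind (4).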

4 Direct Sum, Product, and the Interactive Compression Problem 

Direct sum and direct product theorems assert a lower bound on the complexity of solving $n$ copies
of a problem in parallel, in terms of the cost of a single copy. Let $f^n$ denote the problem of
computing $n$ simultaneous instances of the function $f$ (in some arbitrary computational model for
now), and $C(f)$ denote the cost of solving a single copy of $f$. The obvious solution to $f^n$ is to
apply the single-copy optimal solution $n$ times sequentially and independently to each coordinate,
yielding a linear scaling of the resources, so clearly $C(f^n) \le n \cdot C(f)$. The strong direct sum
conjecture postulates that this naive solution is essentially optimal. In the context of randomized
communication complexity, the strong direct sum conjecture informally asks whether it is true that
for any function $f$ and input distribution $\mu$,

$$D_{\mu^n}(f^n, \varepsilon) \overset{?}{=} \Omega(n) \cdot D_\mu(f, \varepsilon). \qquad (5)$$

More generally, direct sum theorems aim to give an (ideally linear in $n$, but possibly weaker) lower
bound on the communication required for computing $f^n$ with some constant overall error $\varepsilon > 0$ in
terms of the cost of computing a single copy of $f$ with the same (or comparable) fixed error.

A direct product theorem further asserts that unless sufficient resources are provided, the
probability of successfully computing all $n$ copies of $f$ will be exponentially small, potentially as low as
$(1 - \varepsilon)^{\Omega(n)}$. This is intuitively plausible, since the naive solution which applies the best ($\varepsilon$-error)
protocol for one copy of $f$ independently to each of the $n$ coordinates, would indeed succeed in
solving $f^n$ with probability $(1 - \varepsilon)^n$. Is this naive solution optimal?

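To make this more precise, let us denote by $\mathsf{suc}(\mu, f, C)$ the maximum success probability of a
protocol with communication complexity $\le C$ in computing $f$ under input distribution $\mu$. A direct
product theorem asserts that any protocol attempting to solve $f^n$ (under $\mu^n$) using some number $T$
of communication bits (ideally $T = \Omega(n \cdot C)$), will succeed only with exponentially small probability:
$\mathsf{suc}(\mu^n, f^n, T) \le (1 - \varepsilon)^{\Omega(n)}$, where $1 - \varepsilon$ stands for the single-copy success probability
$\mathsf{suc}(\mu, f, C)$. Informally, the strong direct product question asks whether

$$\mathsf{suc}(\mu^n, f^n, o(n \cdot C)) \overset{?}{\le} \left( \mathsf{suc}(\mu, f, C) \right)^{\Omega(n)}. \qquad (6)$$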

Note that (6) in particular implies (5) when setting $C = D_\mu(f, \varepsilon)$. Classic examples of direct
product results in complexity theory are Raz’s Parallel Repetition Theorem [Raz98, Rao08] and
Yao’s XOR Lemma [Yao82] (for more examples and a broader overview of the rich history of
direct sum and product theorems see [JPY12] and references therein). The value of such results to
computational complexity is clear: direct sum and product theorems, together with a lower bound
on the (easier-to-reason-about) “primitive” problem, yield a lower bound on the composite problem
in a “black-box” fashion (a method also known as hardness amplification). For example, the
Karchmer-Raz-Wigderson approach for separating P from NC^1 can be completed via a (still open)
direct sum conjecture for Boolean formulas [KRW95] (after more than a decade, some progress on
this conjecture was recently made using information-complexity machinery [GMWW14]). Other
fields in which direct sums and products have played a central role in proving tight lower bounds
are streaming [BYJKS04, ST13, MWY13, GO13] and distributed computing [HRVZ13].


Can we always hope for such strong lower bounds to hold? It turns out that the validity of these
conjectures highly depends on the underlying computational model, and the short answer is no.^3
In the communication complexity model, this question has had a long history and was answered
positively for several restricted models of communication [Kla10, IShaOdl, LSS08, She12, JPY12,
MWY13, PRW97]. Interestingly, in the deterministic communication complexity model, Feder et al.
[FKNN95] showed that

$$D(f^n) \ge n \cdot \Omega\left( \sqrt{D(f)} \right)$$

for any two-party Boolean function $f$ (where $D(f)$ stands for the deterministic communication
complexity of $f$), but this proof completely breaks when protocols are allowed to err. Indeed, in the
randomized communication model, there is a tight connection between the direct sum question for
the function $f$ and its information complexity. By now, this should come as no surprise: Theorem
1.1 asserts that, for large enough $n$, the communication complexity of $f^n$ scales linearly with the
(single-copy) information cost of $f$, i.e. $D_{\mu^n}(f^n, \varepsilon) = \Theta(n \cdot \mathsf{IC}_\mu(f, \varepsilon))$, and hence the strong direct
sum question (5) boils down to understanding the relationship between the single-copy measures
$D_\mu(f, \varepsilon)$ and $\mathsf{IC}_\mu(f, \varepsilon)$. Indeed, it can be formally shown^4 [BR11] that the direct sum problem is
equivalent to the following problem of “one-shot” compression of interactive protocols:

Problem 4.1 (Interactive compression problem, [BBCR10]). Given a protocol $\pi$ over inputs $x, y \sim
\mu$, with $\|\pi\| = C$, $\mathsf{IC}_\mu(\pi) = I$, what is the smallest amount of communication of a protocol $\tau$ which
(approximately) simulates $\pi$ (i.e., $\exists\, g$ s.t. $|g(\tau(x, y)) - \pi(x, y)|_1 \le \delta$ for a small constant $\delta$)?


In particular, if one could compress any protocol into $O(I)$ bits, this would have shown that
$D_\mu(f, \varepsilon) = O(\mathsf{IC}_\mu(f, \varepsilon))$ which would in turn imply the strong direct sum conjecture. In fact, the
additivity of information cost (Lemma 3.1 from Section 3) implies the following general quantitative
relationship between (possibly weaker) interactive compression results and direct sum theorems in
communication complexity:


Proposition 4.2 (One-Shot Compression implies Direct Sum). Suppose that for any $\delta > 0$ and any
given protocol $\pi$ for which $\mathsf{IC}_\mu(\pi) = I$, $\|\pi\| = C$, there is a compression scheme that $\delta$-simulates^5
$\pi$ using $g_\delta(I, C)$ bits of communication. Then

$$g_\delta\left( \frac{D_{\mu^n}(f^n, \varepsilon)}{n},\ D_{\mu^n}(f^n, \varepsilon) \right) \ge D_\mu(f, \varepsilon + \delta).$$


^3 In the context of circuit complexity, for example, this conjecture fails (at least in its strongest form): Multiplying
an $n \times n$ matrix by a (worst case) $n$-dimensional vector requires $n^2$ operations, while (deterministic) multiplication
of $n$ different vectors by the same matrix amounts to matrix-multiplication of two $n \times n$ matrices, which can be done
in $\ll n^3$ operations [Wil12].

^4 The exact equivalence of the direct sum conjecture and Problem 4.1 holds for relations (Theorem 6.6 in [BR11]).
For total functions, one could argue that the requirement in Problem 4.1 is too harsh as it requires simulation of the
entire transcript of the protocol, while in the direct sum context for functions we are merely interested in the output
of $f$. However, all known compression protocols satisfy the stronger requirement and no separation is known between
those techniques.

^5 The simulation here is in an internal sense, namely, Alice and Bob should be able to reconstruct the transcript of
the original protocol (up to a small error), based on public randomness and their own private inputs. See |BRWYf^
for the precise definition and the (subtle) role it plays in context of direct product theorems.







































Proof. Let $\Pi$ be an optimal $n$-fold protocol for $f^n$ under $\mu^n$ with per-copy error $\varepsilon$, i.e., $\|\Pi\| =
D_{\mu^n}(f^n, \varepsilon) := C_n$. By Lemma 3.1 (equation (4)), there is a single-copy $\varepsilon$-error protocol $\theta$ for
computing $f(x, y)$ under $\mu$, whose information cost is at most $\mathsf{IC}_{\mu^n}(\Pi)/n \le C_n/n$ (since communication
always upper bounds information). By assumption of the claim, $\theta$ can now be $\delta$-simulated using
$g_\delta(C_n/n, C_n)$ communication, so as to produce a single-copy protocol with error $\le \varepsilon + \delta$ for $f$, and
therefore $D_\mu(f, \varepsilon + \delta) \le g_\delta(C_n/n, C_n)$. □


The first general interactive compression result was proposed in the seminal work of Barak,
Braverman, Chen and Rao [BBCR10], who showed that any protocol $\pi$ can be $\delta$-simulated
using $g_\delta(I, C) = \tilde{O}_\delta(\sqrt{C \cdot I})$ communication (we prove this result in Section 5.1). Plugging this
compression result into Proposition 4.2 yields the following weaker direct sum theorem:


Theorem 4.3 (Weak Direct Sum, [BBCR10]). For every Boolean function $f$, distribution $\mu$, and
any positive constant $\delta > 0$,

$$D_{\mu^n}(f^n, \varepsilon) \ge \tilde{\Omega}\left( \sqrt{n} \cdot D_\mu(f, \varepsilon + \delta) \right).$$
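For concreteness, here is a sketch of how the compression bound $g_\delta(I, C) = \tilde{O}_\delta(\sqrt{C \cdot I})$ combines with Proposition 4.2 to give the weak direct sum bound. We write $C_n := D_{\mu^n}(f^n, \varepsilon)$ as in the proof of Proposition 4.2, and the polylogarithmic factors hidden in the tilde notation are treated loosely.

```latex
\begin{align*}
D_\mu(f, \varepsilon + \delta)
  &\le g_\delta\!\left(\frac{C_n}{n},\, C_n\right)
     && \text{(Proposition 4.2)} \\
  &= \tilde{O}_\delta\!\left(\sqrt{C_n \cdot \frac{C_n}{n}}\right)
     && \text{(compression with } I = C_n/n,\ C = C_n\text{)} \\
  &= \tilde{O}_\delta\!\left(\frac{C_n}{\sqrt{n}}\right), \\
\text{and therefore}\quad
C_n = D_{\mu^n}(f^n, \varepsilon)
  &\ge \tilde{\Omega}_\delta\!\left(\sqrt{n} \cdot D_\mu(f, \varepsilon + \delta)\right).
\end{align*}
```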


Later, Braverman [Bra12] showed that it is always possible to simulate $\pi$ using $2^{O_\delta(I)}$ bits of communication. This result is still far from ideal compression ($O(I)$ bits), but it is nevertheless appealing as it shows that any protocol can be simulated using an amount of communication which depends solely on its information cost, and is independent of its original communication, which may have been arbitrarily larger (we prove this result in Section 5.2). Notice that the last two compression results are indeed incomparable, since the communication of $\pi$ could be much larger than its information complexity (e.g., $C \ge 2^{2^{I}}$). The current state of the art for the general interactive compression problem can therefore be summarized as follows: Any protocol with communication $C$ and information cost $I$ can be compressed to

$$g_\delta(I, C) \le \min\left\{2^{O_\delta(I)},\ \tilde{O}_\delta(\sqrt{I \cdot C})\right\} \qquad (7)$$

bits of communication.
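To see concretely why the two bounds in (7) are incomparable, the following minimal sketch evaluates both for a fixed information cost and two different communication costs. All hidden constants and logarithmic factors are set to 1 here, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math

def bbcr_bound(info_cost, comm):
    """sqrt(I * C): the [BBCR10]-type bound, constants and log factors dropped."""
    return math.sqrt(info_cost * comm)

def braverman_bound(info_cost):
    """2^I: the [Bra12]-type bound, with the constant in the exponent set to 1."""
    return 2.0 ** info_cost

def best_known(info_cost, comm):
    """Return the name and value of the smaller of the two bounds in (7)."""
    exp_bound = braverman_bound(info_cost)
    sqrt_bound = bbcr_bound(info_cost, comm)
    if exp_bound <= sqrt_bound:
        return "2^I", exp_bound
    return "sqrt(I*C)", sqrt_bound

# Same information cost I = 10, two very different communication costs.
small_c = best_known(10, 2 ** 12)   # moderate C: the square-root bound is smaller
large_c = best_known(10, 2 ** 40)   # huge C: the exponential-in-I bound is smaller
```

With these illustrative constants, neither bound dominates: which one is better depends on how large $C$ is relative to $2^{I}$.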


The above results may seem like plausible evidence that it is in fact possible to compress general interactive protocols all the way down to $O(I)$ bits. Unfortunately, this task turns out to be too ambitious: In a recent breakthrough result, Ganor, Kol and Raz [GKR14] proved the following lower bound on the communication of any compression scheme:

$$g_\delta(I, C) \ge \max\left\{2^{\Omega(I)},\ \Omega(I \cdot \log C)\right\}. \qquad (8)$$

More specifically, they exhibit a Boolean function $f$ which can be solved using a protocol with information cost $I$, but cannot be simulated by a protocol $\pi'$ with communication cost $< 2^{\Omega(I)}$ (a simplified construction and proof was very recently obtained by Rao and Sinha [RS15]). Since the communication of the low information protocol they exhibit is $\approx 2^{2^{I}}$, this also rules out a compression to $I \cdot o(\log C)$, or else such compression would have produced a too good to be true ($2^{o(I)}$ communication) protocol. The margin of this text is too narrow to contain the proof of this separation result, but it is noteworthy that proving it was particularly challenging: It was shown that essentially all previously known techniques for proving communication lower bounds apply to information complexity as well [BW12, KLL+12], and hence could not be used to separate information complexity and communication complexity. Using (the reverse direction of) Proposition 4.2 (see Theorem 6.6 in [BR11]), the compression lower bound in (8) refutes the strongest possible direct sum, but leaves open the following gap:


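$$\tilde{\Omega}_\delta(\sqrt{n}) \le \min \frac{D_{\mu^n}(f^n, \varepsilon)}{D_\mu(f, \varepsilon + \delta)} \le O\left(\frac{n}{\log n}\right) \qquad (9)$$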


Notice that this still leaves the direct sum conjecture for randomized communication complexity wide open: It is still conceivable that improved compression to $g_\delta(I, C) = I \cdot C^{o(1)}$ is in fact possible, and the quest to beat the compression scheme of [BBCR10] remains unsettled.⁶

Despite the lack of progress in the general regime, several works showed that it is in fact possible to obtain near-optimal compression results in restricted models of communication: When the input distribution $\mu$ is a product distribution ($x$ and $y$ are independent), [BBCR10] show a near-optimal compression result, namely that $\pi$ can be compressed into $O(I \cdot \mathrm{polylog}(C))$ bits.⁷ Once again, using Proposition 4.2 this yields the following direct sum theorem:

Theorem 4.4 ([BBCR10]). For every product distribution $\mu$ and any $\delta > 0$,

$$D_{\mu^n}(f^n, \varepsilon) = \tilde{\Omega}\left(n \cdot D_\mu(f, \varepsilon + \delta)\right).$$


Improved compression results were also proven for public-coin protocols (under arbitrary distributions) [BBK+13, BMY14], and for bounded-round protocols, leading to near-optimal direct sum theorems in corresponding communication models. We summarize these results in Table 1.


Reference | Regime | Communication Complexity
[HJMR07] | $r$-round protocols, product distributions⁷ | $I + O(r)$
[BR11, BRWY13] | $r$-round protocols | $I + O(\sqrt{r \cdot I}) + O(r \log 1/\delta)$
[BMY14] (improved [BBK+13]) | Public coin protocols | $O(I^2 \cdot \log\log(C)/\delta^2)$
[BBCR10] | Product distributions⁷ | $O(I \cdot \mathrm{polylog}(C)/\delta)$
[Bra12, BBCR10] | General protocols | $\min\{2^{O(I/\delta)},\ O(\sqrt{I \cdot C} \cdot \log(C)/\delta)\}$
[GKR14, RS15] | Best lower bound | $\max\{2^{\Omega(I)},\ \Omega(I \cdot \log(C))\}$

Table 1: Best to date compression schemes, for various regimes. Notice that in the general regime (last two rows), in terms of dependence on the original communication $C$, the gap is still very large ($\Omega(\log C)$ vs. $O(C^{1/2})$).


4.1 Harder, better, stronger: From direct sum to direct product 


As noted above, direct sum theorems such as Theorems 1.1, 4.3 and 4.4 are weak in that they merely assert that attempting to solve $n$ independent copies of $f$ using less than some number $T$ of resources would fail with some constant overall probability ($\mathrm{suc}(\mu^n, f^n, o(\sqrt{n} \cdot C)) \le \varepsilon$ in the general case, and $\mathrm{suc}(\mu^n, f^n, o(n \cdot C)) \le \varepsilon$ in the product case, where $C = D_\mu(f, \varepsilon)$). This is somewhat


⁶Ramamoorthy and Rao [RR15] recently showed that BBCR’s compression scheme can be improved when the
underlying communication protocol is asymmetric, i.e., when Alice reveals much more information than Bob.

⁷These compression results in fact hold for general (non-product) distributions as well, when compression is with
respect to the external information cost of the original protocol $\pi$ (which may be significantly larger than $I$).














































unsatisfactory, since the naive solution that applies the single-copy optimal protocol independently 
to each copy has only exponentially small success probability in solving all copies correctly. Indeed, 
some of the most important applications of hardness amplification require amplifying the error 
parameter (e.g., the usage of parallel repetition in the context of the PCP theorem). 

As mentioned before, many direct product theorems were proven in limited communication models (e.g., Shaltiel’s discrepancy bound [Sha03, LSS08], which was extended to the generalized discrepancy bound [She11], Parnafes, Raz, and Wigderson’s theorem for communication forests [PRW97], Jain’s theorem [Jai11] for simultaneous communication, and [JY12]’s direct product in terms of the “smooth rectangle bound”, to mention a few), but none of them applied to general functions and communication protocols. In a recent breakthrough work, Jain, Pereszlényi and Yao used an information-complexity based approach to prove a strong direct product theorem for any function (relation) in the bounded-round communication model.

Theorem 4.5 ([JPY12]). Let $\mathrm{suc}_r(\mu, f, C)$ denote the largest success probability of an $r$-round protocol with communication at most $C$, and suppose that $\mathrm{suc}_r(\mu, f, C) \le \frac{2}{3}$. If $T = o\left(\left(\frac{C}{r} - r\right) \cdot n\right)$, then $\mathrm{suc}_r(\mu^n, f^n, T) \le \exp(-\Omega(n/r^2))$.

This theorem can be essentially viewed as a sharpening of the direct sum theorem of Braverman and Rao for bounded-round communication [BR11]. This bound was later improved by Braverman et al., who showed that $\mathrm{suc}_{r/7}(\mu^n, f^n, o((C - r \log r) \cdot n)) \le \exp(-\Omega(n))$, thus settling the strong direct product conjecture in the bounded-round regime. The followup work of [BRWY12] took this approach one step further, obtaining the first direct product theorem for unbounded-round randomized communication complexity, thus sharpening the direct sum results of [BBCR10].

Theorem 4.6 ([BRWY12], informally stated). For any two-party function $f$ and distribution $\mu$ such that $\mathrm{suc}(\mu, f, C) \le \frac{2}{3}$, the following holds:

• If $T \log^{3/2} T = o(C \cdot \sqrt{n})$, then $\mathrm{suc}(\mu^n, f^n, T) \le \exp(-\Omega(n))$.

• If $\mu$ is a product distribution, and $T \log^{2} T = o(C \cdot n)$, then $\mathrm{suc}(\mu^n, f^n, T) \le \exp(-\Omega(n))$.

One appealing corollary of the second proposition is that, under the uniform distribution, two-party interactive computation cannot be “parallelized”, in the sense that the best protocol for solving $f^n$ (up to polylogarithmic factors) is to apply the single-coordinate optimal protocol independently to each copy, which almost matches the above parameters.


The high-level intuition behind the proofs of Theorems 4.5 and 4.6 follows the direct sum approach of [BBCR10] (Proposition 4.2 above): Suppose, towards contradiction, that the success probability of an $n$-fold protocol using $T$ bits of communication in computing $f^n$ under $\mu^n$ is larger than, say, $\exp(-n/100)$. We would like to “embed” a single-copy $(x, y) \sim \mu$ into this $n$-fold protocol, thereby producing a low information protocol ($\le T/n$ bits), and then use known compression schemes to compress this protocol, eventually obtaining a protocol with communication $< C$ and a too-good-to-be-true success probability ($> 2/3$), contradicting the assumption that $\mathrm{suc}(\mu, f, C) \le \frac{2}{3}$. The main problem with employing the [BBCR10] approach and embedding a single-copy $(x, y)$ into $\pi$ using the sampling argument in Lemma 3.1 is that it would produce a single-copy protocol $\theta(x, y)$ whose success probability is no better than that of $\pi$ (namely $\exp(-n/100)$), while we need to produce a single-copy protocol with success $> 2/3$ in order to achieve the above contradiction.
































Circumventing this major obstacle is inspired by the idea of repeated conditioning which first appeared in the parallel repetition theorem [Raz98]: Let $\mathcal{W}$ be the event that $\pi$ correctly computes $f^n$, and $\mathcal{W}_i$ denote the event that the protocol correctly computes the $i$’th copy $f(x_i, y_i)$. Let $\pi(\mathcal{W})$ denote the probability of $\mathcal{W}$, and $\pi(\mathcal{W}_i|\mathcal{W})$ denote the conditional probability of the event $\mathcal{W}_i$ given $\mathcal{W}$ (clearly, $\pi(\mathcal{W}_i|\mathcal{W}) = 1$). The idea is to show that if $\pi(\mathcal{W}) \ge \exp(-n/100)$ and $\|\pi\| \ll T$ (for the appropriate choice of $T$ which is determined by the best compression scheme), then $(1/n) \sum_{i=1}^{n} \pi(\mathcal{W}_i|\mathcal{W}) < 1$, which is a contradiction. In other words, if one could simulate the message distribution of the conditional distribution $(\pi|\mathcal{W})_i$ (rather than the distribution of $\pi(x_i, y_i)$) using a low information protocol, then (via compression) one would obtain a protocol $\theta(x_i, y_i)$ with constant success probability, as desired.

The guiding intuition for why this approach makes sense, is that conditioning a random variable 
on a “large” event W does not change its original distribution too much: 


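$$\mathbb{D}\left(X_1Y_1, X_2Y_2, \ldots, X_nY_n \mid \mathcal{W} \,\middle\|\, X_1Y_1, X_2Y_2, \ldots, X_nY_n\right) = \mathbb{D}(\mathbf{XY}|\mathcal{W} \,\|\, \mathbf{XY}) = \mathbb{E}\left[\log \frac{\pi(\mathbf{XY}|\mathcal{W})}{\pi(\mathbf{XY})}\right] \le \mathbb{E}\left[\log \frac{\pi(\mathbf{XY})}{\pi(\mathbf{XY})\,\pi(\mathcal{W})}\right] = \log \frac{1}{\pi(\mathcal{W})} \le \frac{n}{100}$$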


since $\pi(\mathcal{W}) \ge \exp(-n/100)$, which means (by the chain rule and independence of the $n$ copies) that the distribution of an average input pair $(X_i, Y_i)$ conditioned on $\mathcal{W}$ is $(1/100)$-close to its original distribution $\mu$, and thus implies that at least the inputs to the “protocol” $(\pi|\mathcal{W})_i$ can be approximately sampled correctly (using correlated sampling [Hol07]). The heart of the problem, however, is that $(\pi|\mathcal{W})_i$ is no longer a communication protocol. To see why, consider the simple protocol $\pi$ in which Alice simply “guesses” Bob’s bit $y$, and $\mathcal{W}$ being the event that her guess is correct. Then simulating $(\pi|\mathcal{W})$ requires Alice to know Bob’s input $y$, which Alice doesn’t have! This example shows that it is impossible to simulate the message distribution of $(\pi|\mathcal{W})_i$ exactly. The main contribution of Theorem 4.6 (and Theorem 4.5 in the bounded-round regime) is showing that it is nevertheless possible to approximate this conditional distribution using an actual communication protocol, which is statistically close to a low-information protocol:


Lemma 4.7 (Claims 26 and 27 from [BRWY12], informally stated). There is a protocol $\theta$ taking inputs $x, y \sim \mu$ so that the following holds:

• $\theta$ publicly chooses a uniform $i \in [n]$ independent of $x, y$, and $R_i$ which is part of the input to $\pi$ (intuitively, $R_i$ determines the “missing” inputs $x_{-i}, y_{-i}$ of $\pi$ as in Lemma 3.1).

• $\mathbb{E}_i\left[\,|(\theta|R_i) - (\pi|R_i\mathcal{W})_i|\,\right] \le 1/10$ (that is, $\theta$ is close to the distribution $(\pi|\mathcal{W})_i$ for average $i$).

• $\mathbb{E}_i\left[I_{\pi|\mathcal{W}}(X_i; \Pi|Y_iR_i) + I_{\pi|\mathcal{W}}(Y_i; \Pi|X_iR_i)\right] \le 4\|\pi\|/n$ (that is, the information cost of the distribution $(\pi|\mathcal{W})_i$ is low).


The main challenge in proving this lemma is in the choice of the public random variable $R_i$, which enables relating the information of the protocol $\theta$ to that of $(\pi|\mathcal{W})$ even under the conditioning on $\mathcal{W}$. This technically-involved argument is a “conditional” analogue of Lemma 3.1 (for details see [BRWY12]). Note that the last proposition of Lemma 4.7 only guarantees that the information cost of the transcript under the distribution $(\pi|\mathcal{W})$ is low (on an average coordinate), while we need this property to hold for the simulating protocol $\theta$, in order to apply the compression schemes of [BBCR10] which would finish the proof. Unfortunately, a protocol $\pi$ that is statistically close to a low-information distribution need not be a low-information protocol itself: Consider, for example, a protocol $\pi$ where with probability $\delta$ Alice sends her input $X \in \{0,1\}^n$ to Bob, and with probability $1 - \delta$ she sends a random string. Then $\pi$ is $\delta$-close to a 0-information protocol, but has information complexity of $\approx \delta \cdot n$, which could be arbitrarily high. [BRWY12] circumvented this problem by showing that the necessary compression schemes of [BBCR10] are “smooth” in the sense that they also work for protocols that are merely close to having low information. In a followup work, Braverman and Weinstein exhibited a general technique for converting protocols which are statistically close to having low information into actual low-information protocols (see Theorem 3 in [BW14]), which combined with Lemma 4.7 also led to a strong direct product theorem in terms of information complexity, sharpening the “Information = Amortized Communication” Theorem of Braverman and Rao:


Theorem 4.8 ([BW14], informally stated). Suppose that $\mathrm{IC}_\mu(f, 2/3) = I$, i.e., solving a single copy of $f$ with probability $2/3$ under $\mu$ requires $I$ bits of information. If $T \log(T) = o(n \cdot I)$, then $\mathrm{suc}(\mu^n, f^n, T) \le \exp(-\Omega(n))$.

In fact, this theorem shows that the direct sum and product conjectures in randomized communication complexity are equivalent (up to polylogarithmic factors), and they are both equivalent to one-shot interactive compression, in the quantitative sense of Proposition 4.2 (we refer the reader to [BW14] for the formal details).


5 State of the Art Interactive Compression Schemes 

In this section we present the two state-of-the-art compression schemes for unbounded-round communication protocols, the first due to Barak et al., and the second due to Braverman [BBCR10, Bra12]. As mentioned in the introduction, a natural idea for compressing a multi-round protocol is to try and compress each round separately, using ideas from the transmission (one-way) setup [Huf52, HJMR07, BR11]. Such compression suffers from one fatal flaw: It would inevitably require sending at least 1 bit of communication at each round, while the information revealed in each round may be $\ll 1$ (an instructive example is the protocol in which Alice sends Bob, at each round of the protocol, an independent coin flip which is $\varepsilon$-biased towards her input $X \sim \mathrm{Ber}(1/2)$, for $\varepsilon \ll 1$). Thus any attempt to implement the compression on a round-by-round basis is hopeful only when the number of rounds is bounded but is doomed to fail in general (indeed, this is the essence of the bounded-round compression schemes of [BR11, BRWY13]).

The main feature of the compression results we present below is that they do not depend 
on the number of rounds of the underlying protocol, but only on the overall communication and 
information cost. 

5.1 Barak et al.’s compression scheme 

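Theorem 5.1 ([BBCR10]). Let $\pi$ be a protocol executed over inputs $x, y \sim \mu$, and suppose $\mathrm{IC}_\mu(\pi) = I$ and $\|\pi\| = C$. Then for every $\varepsilon > 0$, there is a protocol $\tau$ which $\varepsilon$-simulates $\pi$, where

$$\|\tau\| = O\left(\sqrt{C \cdot I} \cdot (\log(C/\varepsilon)/\varepsilon)\right). \qquad (10)$$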

Proof. The conceptual idea underlying this compression result is using public randomness to avoid communication by trying to guess what the other player is about to say. Informally speaking, the players will use shared randomness to sample (correlated) full paths of the protocol tree, according to their private knowledge: Alice has the “correct” distribution on nodes that she owns in the tree (since conditioned on reaching these nodes, the next messages only depend on her input $x$), and will use her “best guess” (i.e., her prior distribution on Bob’s next message, under $\mu$, given her input $x$ and the history of messages) to sample messages at nodes owned by Bob. Bob will do the same on nodes owned by Alice. This “guessing” is done in a correlated way using public randomness (and no communication whatsoever (!)), in a way that guarantees that if the players’ guesses are close to the correct distribution, then the probability that they sample the same bit is large.

The above step gives rise to two paths, $P_A$ and $P_B$ respectively. In the next step, the players will use (mild) communication to find all inconsistencies among $P_A$ and $P_B$ and correct them one by one (according to the “correct” speaker). By the end of this process, the players obtain a consistent path which has the correct distribution $\Pi(x, y)$. Therefore, the overall communication of the simulating protocol would be comparable to the number of mistakes between $P_A$ and $P_B$ (times the communication cost of fixing each mistake). Intuitively, the fact that $\pi$ has low information will imply that the number of inconsistencies is small, as inconsistent samples on a given node typically occur when the “receiver’s” prior distribution is far from the “speaker’s” correct distribution, which will in turn imply that this bit conveyed a lot of information to the receiver. (Alas, we will see that if the information revealed by the $i$’th bit of $\pi$ is $\varepsilon$, then the probability of making a mistake on the $i$’th node is $\approx \sqrt{\varepsilon}$, and this is the source of sub-optimality of the above result. We discuss this bottleneck at the end of the proof.)

We now sketch the proof more formally (yet still leaving out some minor technicalities). Let $\Pi = M_1, \ldots, M_C$ denote the transcript of $\pi$. Each node $w$ at depth $i$ of the protocol tree of $\pi$ is associated with two numbers, $p_{x,w}$ and $p_{y,w}$, describing the probability (according to each player’s respective “belief”) that conditioned on reaching $w$, the next bit sent in $\pi$ is “1” (the right child of $w$). That is,

$$p_{x,w} := \Pr[M_i = 1 \mid x, r, M_{<i} = w], \quad \text{and} \quad p_{y,w} := \Pr[M_i = 1 \mid y, r, M_{<i} = w]. \qquad (11)$$

Note that if $w$ is owned by Alice, then $p_{x,w}$ is exactly the correct probability with which the $i$-th bit is transmitted in $\pi$, conditioned on $\pi$ having reached $w$.

In the simulating protocol $\tau$, the players first sample, without communication and using public randomness, a uniformly random number $\rho_w$ in the interval $[0, 1]$, for every node $w$ of the protocol tree.⁸ For simplicity of analysis, in the rest of the proof we assume the public randomness is fixed to the value $R = r$. Alice and Bob now privately construct the following respective trees $T_A, T_B$: For each node $w$, Alice includes the right child of $w$ in $T_A$ iff $\rho_w < p_{x,w}$, and the left child (“0”) otherwise. Bob does the same by including the right child of $w$ in $T_B$ iff $\rho_w < p_{y,w}$.

The trees $T_A$ and $T_B$ define a unique path $\ell = m_1, \ldots, m_C$ of $\pi$, by combining outgoing edges from $T_A$ in nodes owned by Alice, and edges from $T_B$ in nodes owned by Bob. Note that $\ell$ has precisely the desired distribution of $\Pi(X, Y)$. To identify $\ell$, the players will now find the inconsistencies among $T_A$ and $T_B$ and correct them one by one.
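The correlated sampling step can be illustrated on a toy protocol. The sketch below is a simplification under stated assumptions: it only walks the single "true" path rather than materializing the (exponentially large) trees, and the callables `p_alice`, `p_bob` and `owner` are hypothetical stand-ins for the beliefs $p_{x,w}$, $p_{y,w}$ and for node ownership.

```python
import random

def next_bit(belief, rho):
    """A player's sampled edge at a node: the right child ("1") iff rho < belief."""
    return 1 if rho < belief else 0

def simulate_path(p_alice, p_bob, owner, depth, rng):
    """Walk the true path of a toy protocol tree and count levels with a mistake.

    p_alice(w), p_bob(w): each player's probability that the next bit at node w is 1,
    where w is the tuple of bits sent so far. owner(w) returns "A" or "B".
    """
    w, mistakes = (), 0
    for _ in range(depth):
        rho = rng.random()                    # shared public random number for node w
        a_bit = next_bit(p_alice(w), rho)
        b_bit = next_bit(p_bob(w), rho)
        if a_bit != b_bit:
            mistakes += 1                     # the two trees disagree at this level
        # The owner of the node holds the correct distribution, so the owner's edge is kept.
        w += (a_bit if owner(w) == "A" else b_bit,)
    return w, mistakes

# Toy instance: Alice owns every node and sends 1 w.p. 0.7; Bob's belief is 0.5 throughout.
rng = random.Random(0)
trials, depth, total = 2000, 20, 0
for _ in range(trials):
    _, m = simulate_path(lambda w: 0.7, lambda w: 0.5, lambda w: "A", depth, rng)
    total += m
mean_mistakes = total / trials                # close to depth * |0.7 - 0.5| = 4
```

Because both players threshold the same public number, they disagree at a node exactly when that number falls between their two beliefs, so the per-node mistake probability is the gap between the beliefs.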

We say that a mistake occurs in level $i$ if the outgoing edges of $m_{i-1}$ in $T_A$ and $T_B$ are inconsistent. Finding the (first) mistake of $\tau$ amounts to finding the first differing index among two $C$-bit strings (corresponding to the paths $P_A$ and $P_B$ induced by $T_A$ and $T_B$). Luckily, there is a randomized protocol which accomplishes this task with high probability ($1 - \gamma$) using only $O(\log(C/\gamma))$

⁸Note that there are exponentially many nodes, but the communication model does not charge for local computations or the amount of shared randomness, so these resources are indeed “for free”.



bits of communication, using a clever “noisy” binary search due to Feige et al. [FPRU94]. Since errors accumulate over $C$ rounds and we are aiming for an overall simulation error of $\varepsilon$, we will set $\gamma = \varepsilon/C$, thus the cost of fixing each inconsistency remains $O(\log(C/\varepsilon))$ bits. The expected communication complexity of $\tau$ (over $X, Y, R$) is therefore

$$\mathbb{E}[\|\tau\|] = \mathbb{E}[\#\text{ mistakes of } \tau] \cdot O(\log(C/\varepsilon)). \qquad (12)$$
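The task of locating the first inconsistency can be illustrated with a plain binary search over hashed prefixes. This is a simplified stand-in, not the noisy binary search of [FPRU94]: it spends a fixed-size fingerprint per comparison (so roughly $\log C$ fingerprints in total) and uses a keyed BLAKE2 hash to play the role of public randomness. The names `prefix_tag`, `first_difference` and `TAG_BITS` are hypothetical.

```python
import hashlib

TAG_BITS = 32  # fingerprint length per comparison (an illustrative choice)

def prefix_tag(bits, length, salt):
    """Short fingerprint of bits[:length]; the salt stands in for public randomness."""
    return hashlib.blake2b(bits[:length].encode(), key=salt,
                           digest_size=TAG_BITS // 8).digest()

def first_difference(a, b, salt=b"public-coins"):
    """Return (first differing index or None, number of fingerprint bits exchanged)."""
    assert len(a) == len(b)
    bits_sent = TAG_BITS
    if prefix_tag(a, len(a), salt) == prefix_tag(b, len(b), salt):
        return None, bits_sent              # whole strings (almost surely) agree
    lo, hi = 0, len(a)                      # invariant: prefixes of length lo agree,
    while hi - lo > 1:                      #            prefixes of length hi differ
        mid = (lo + hi) // 2
        bits_sent += TAG_BITS
        if prefix_tag(a, mid, salt) == prefix_tag(b, mid, salt):
            lo = mid
        else:
            hi = mid
    return lo, bits_sent                    # position lo (0-based) is the first mismatch

index, cost = first_difference("0110100110", "0110110110")
```

In the actual protocol the two strings live on different sides, so each comparison corresponds to one player sending a fingerprint to the other; the fingerprints can err with small probability, which is what the error parameter $\gamma$ accounts for.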


Though we are not quite done, one should appreciate the simplicity of analysis of the cost of this protocol. The next lemma completes the proof, asserting that the expected number of mistakes $\tau$ makes is not too large:

Lemma 5.2. $\mathbb{E}[\#\text{ mistakes of } \tau] \le \sqrt{C \cdot I}$.

Indeed, substituting the assertion of Lemma 5.2 into (12), we conclude that the expected communication complexity of $\tau$ is $O(\sqrt{C \cdot I} \cdot \log(C/\varepsilon))$, and a standard Markov bound yields the bound in (10) and therefore finishes the proof of Theorem 5.1.


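Proof of Lemma 5.2. Let $\mathcal{E}_i$ be the indicator random variable denoting whether a mistake has occurred in step $i$ of the protocol tree of $\pi$. Hence the expected number of mistakes is $\sum_{i=1}^{C} \mathbb{E}[\mathcal{E}_i]$. We shall bound each term $\mathbb{E}[\mathcal{E}_i]$ separately. By construction, a mistake at node $w$ in level $i$ occurs exactly when either $p_{x,w} < \rho_w < p_{y,w}$ or $p_{y,w} < \rho_w < p_{x,w}$, where $\rho_w$ is the shared random number sampled for node $w$. Since $\rho_w$ was uniform in $[0, 1]$, the probability of a mistake is

$$|p_{x,w} - p_{y,w}| = \left|(M_i \mid x, r, M_{<i} = w) - (M_i \mid y, r, M_{<i} = w)\right|,$$

where the last transition is by definition of $p_{x,w}$ and $p_{y,w}$. Note that, by definition of a protocol, if $w := m_{<i}$ is owned by Alice, then $M_i \mid x, r, m_{<i} = M_i \mid x, y, r, m_{<i}$, and if it is owned by Bob, then $M_i \mid y, r, m_{<i} = M_i \mid x, y, r, m_{<i}$. We therefore have

$$
\begin{aligned}
\mathbb{E}[\mathcal{E}_i] &= \mathbb{E}_{x y m_{<i} \sim \pi}\left[\,\left|(M_i|x r m_{<i}) - (M_i|y r m_{<i})\right|\,\right] \\
&\le \mathbb{E}_{x y m_{<i} \sim \pi}\left[\max\left\{\left|(M_i|x y r m_{<i}) - (M_i|x r m_{<i})\right|,\ \left|(M_i|x y r m_{<i}) - (M_i|y r m_{<i})\right|\right\}\right] \\
&\le \mathbb{E}_{x y m_{<i} \sim \pi}\left[\sqrt{\mathbb{D}(M_i|x y r m_{<i} \,\|\, M_i|x r m_{<i}) + \mathbb{D}(M_i|x y r m_{<i} \,\|\, M_i|y r m_{<i})}\right] && (13) \\
&\le \sqrt{\mathbb{E}_{x y m_{<i} \sim \pi}\left[\mathbb{D}(M_i|x y r m_{<i} \,\|\, M_i|x r m_{<i}) + \mathbb{D}(M_i|x y r m_{<i} \,\|\, M_i|y r m_{<i})\right]} && (14) \\
&= \sqrt{I(M_i; X|M_{<i}RY) + I(M_i; Y|M_{<i}RX)} && (15)
\end{aligned}
$$

where transition (13) follows from Pinsker’s inequality (Lemma 2.6), transition (14) follows from the concavity of $\sqrt{\cdot}$ (Jensen’s inequality), and the last transition is by Proposition 2.3.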

Finally, by linearity of expectation and the Cauchy-Schwarz inequality, we conclude that

$$\mathbb{E}\left[\sum_{i=1}^{C} \mathcal{E}_i\right] = \sum_{i=1}^{C} \mathbb{E}[\mathcal{E}_i] \le \sum_{i=1}^{C} \sqrt{I(M_i; X|M_{<i}RY) + I(M_i; Y|M_{<i}RX)} \le \sqrt{C \cdot \sum_{i=1}^{C} \left[I(M_i; X|M_{<i}RY) + I(M_i; Y|M_{<i}RX)\right]} = \sqrt{C \cdot I},$$

where the last transition is by the chain rule for mutual information. □

A natural question arising from the above compression scheme is whether the analysis in Lemma 5.2 is tight. Unfortunately, the answer is yes, as demonstrated by the following example: Suppose Alice has a single uniform bit $X \sim \mathrm{Ber}(1/2)$, and consider the $C$-bit protocol in which Alice sends, at each round $i$, an independent sample $M_i$ such that

$$M_i \sim \begin{cases} \mathrm{Ber}(1/2 + \varepsilon) & \text{if } x = 1 \\ \mathrm{Ber}(1/2 - \varepsilon) & \text{if } x = 0 \end{cases}$$

for $\varepsilon = 1/\sqrt{C}$. Since Bob has a perfectly uniform prior on $X$, a direct calculation shows that in this case $I(M_i; X|M_{<i}) \le I(M_i; X) = \mathbb{D}(\mathrm{Ber}(1/2 + \varepsilon) \,\|\, \mathrm{Ber}(1/2)) = O(\varepsilon^2)$, while the probability of making a “mistake” at step $i$ of the simulation above is the total variation distance $|\mathrm{Ber}(1/2 + \varepsilon) - \mathrm{Ber}(1/2)| \approx \varepsilon$. Therefore, the expected number of mistakes conditioned on, say, $x = 1$, is $C \cdot \varepsilon = \sqrt{C}$, by choice of $\varepsilon = 1/\sqrt{C}$. That is, this example shows that both Pinsker’s and the Cauchy-Schwarz inequalities are tight in the extreme case where each of the $C$ bits of $\pi$ reveals $\approx I/C$ bits of information. In the next section we present a different compression scheme which can do better in this regime, at least when $I$ is much smaller than $C$.
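The numbers in this example are easy to check directly. The sketch below assumes, as in the calculation above, that Bob's belief stays at $\mathrm{Ber}(1/2)$ throughout, and picks $C = 10{,}000$ as an arbitrary illustrative value.

```python
import math

def kl_bernoulli(p, q):
    """D(Ber(p) || Ber(q)) in bits, for p and q strictly between 0 and 1."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

def tv_bernoulli(p, q):
    """Total variation distance between Ber(p) and Ber(q)."""
    return abs(p - q)

C = 10_000
eps = 1 / math.sqrt(C)

info_per_bit = kl_bernoulli(0.5 + eps, 0.5)   # about (2 / ln 2) * eps^2 bits
mistake_prob = tv_bernoulli(0.5 + eps, 0.5)   # eps, up to floating-point rounding

total_info = C * info_per_bit                 # a constant number of bits overall
expected_mistakes = C * mistake_prob          # C * eps = sqrt(C)
```

The total information stays bounded by a constant while the expected number of mistakes grows like $\sqrt{C}$, which is the $\sqrt{C \cdot I}$ behaviour of Lemma 5.2 up to constants.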


5.2 Braverman’s compression scheme 

Theorem 5.3 ([Bra12]). Let $\pi$ be a protocol executed over inputs $x, y \sim \mu$, and suppose $\mathrm{IC}_\mu(\pi) = I$. Then for every $\varepsilon > 0$, there is a protocol $\tau$ which $\varepsilon$-simulates $\pi$, where $\|\tau\| = 2^{O(I/\varepsilon)}$.

Proof. To understand this result, it will be useful to view the interactive compression problem as the following correlated sampling task: Denote by $\pi_{xy}$ the distribution of the transcript $\Pi(x, y)$, and by $\pi_x$ (resp. $\pi_y$) the conditional marginal distribution $\Pi|x$ ($\Pi|y$) of the transcript from Alice’s (Bob’s) point of view (for notational ease, the conditioning on the public randomness $r$ of the protocol is included here implicitly. Note that in general $\pi$ is still randomized even conditioned on $x, y$, since it may have private randomness). By the product structure of communication protocols, the probability of reaching a leaf (path) $\ell \in \{0,1\}^C$ of $\pi$ is

$$\pi_{xy}(\ell) = p_x(\ell) \cdot p_y(\ell) \qquad (16)$$

where $p_x(\ell) = \prod_{w \subseteq \ell,\ |w| \text{ odd}} p_{x,w}$ is the product of the transition probabilities defined in (11) on the nodes owned by Alice along the path from the root to $\ell$, and $p_y(\ell)$ is analogously defined on the even nodes. Thus, the desirable distribution from which the players wish to jointly sample decomposes to a natural product distribution.⁹ Similarly,

$$\pi_x(\ell) = p_x(\ell) \cdot q_x(\ell) \quad \text{and} \quad \pi_y(\ell) = q_y(\ell) \cdot p_y(\ell) \qquad (17)$$

where $q_x(\ell) = \prod_{w \subseteq \ell,\ |w| \text{ even}} p_{x,w}$ is Alice’s prior “belief” on the even nodes owned by Bob along the path to $\ell$ (see (11)), and $q_y(\ell) = \prod_{w \subseteq \ell,\ |w| \text{ odd}} p_{y,w}$ is Bob’s prior belief on the odd nodes owned by Alice. Thus, the players’ goal is to sample $\ell \sim \pi_{x,y}$, where Alice has the correct distribution on odd nodes (and only an estimate on the even ones), and Bob has the correct distribution on even

⁹As we shall see, the rejection sampling approach of the compression protocol below crucially exploits this product
structure of the target distribution, and it is curious to note this simplifying feature of interactive compression as
opposed to general correlated sampling tasks.






nodes (and an estimate on the odd ones).


We claim that the information cost of $\pi$ being low ($I$) implies that Alice’s prior “belief” $q_x$ on the even nodes owned by Bob is “close” to the true distribution $p_y$ on these nodes (and vice versa for $q_y$ and $p_x$ on the odd nodes). To see this, recall the equivalent interpretation of mutual information in terms of KL-divergence:

$$
\begin{aligned}
I = I(\Pi; X|Y) + I(\Pi; Y|X) &= \mathbb{E}_{(x,y) \sim \mu}\left[\mathbb{D}(\pi_{xy} \,\|\, \pi_y) + \mathbb{D}(\pi_{xy} \,\|\, \pi_x)\right] \\
&= \mathbb{E}_{x, y,\ \ell \sim \pi_{x,y}}\left[\log \frac{\pi_{xy}(\ell)}{\pi_y(\ell)} + \log \frac{\pi_{xy}(\ell)}{\pi_x(\ell)}\right] \\
&= \mathbb{E}_{x, y,\ \ell \sim \pi_{x,y}}\left[\log \frac{p_x(\ell)}{q_y(\ell)} + \log \frac{p_y(\ell)}{q_x(\ell)}\right] && (18)
\end{aligned}
$$

where the last transition follows from substituting the terms according to (16) and (17). The above equation asserts that the typical log-ratio $p_x/q_y$ is at most $I$, and the same holds for $p_y/q_x$. The following simple corollary essentially follows from Markov’s inequality,¹⁰ so we state it without a proof.

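Corollary 5.4. Define the set of transcripts $B_\varepsilon := \{\ell : p_x(\ell) > 2^{(I+1)/\varepsilon} \cdot q_y(\ell) \text{ or } p_y(\ell) > 2^{(I+1)/\varepsilon} \cdot q_x(\ell)\}$. Then $\pi_{x,y}(B_\varepsilon) < \varepsilon$.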


The intuitive operational interpretation of the above claim is that, for almost all transcripts $\ell$, the following holds: If a uniformly random point in $[0, 1]$ falls below $p_y(\ell)$, then the probability it falls below $q_x(\ell)$ as well is $\ge 2^{-I}$. This intuition gives rise to the following rejection sampling approach: The players interpret the public random tape as a sequence of points $(\ell_i, \alpha_i, \beta_i)$, uniformly distributed in $\mathcal{U} \times [0, 1] \times [0, 1]$, where $\mathcal{U} = \{0,1\}^C$ is the set of all possible transcripts of $\pi$. Their goal will be to discover the first index $i^*$ such that $\alpha_{i^*} \le p_x(\ell_{i^*})$ and $\beta_{i^*} \le p_y(\ell_{i^*})$. Note that, by design, the probability that a random point satisfies these conditions is precisely $p_x(\ell_i) \cdot p_y(\ell_i) = \pi_{xy}(\ell_i)$, and therefore $\ell_{i^*}$ has the correct distribution.

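The players consider only the first $t := 2|\mathcal{U}| \ln(1/\varepsilon)$ points of the public tape, as the probability that a single point satisfies the desirable condition is exactly $1/|\mathcal{U}|$, and thus by independence of the points, the probability that $i^* > t$ is at most $(1 - 1/|\mathcal{U}|)^t \le e^{-2\ln(1/\varepsilon)} = \varepsilon^2$ (which is $< \varepsilon/16$ for $\varepsilon < 1/16$).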
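The rejection sampling step can be checked on a toy instance. The sketch below uses a hypothetical 2-bit protocol (Alice sends the first bit, Bob the second) whose factors `p_x` and `p_y` are made-up numbers, and verifies empirically that the first accepted public point is distributed according to the product $p_x(\ell) \cdot p_y(\ell)$. Here a single party evaluates both conditions, which the real players cannot do; this only illustrates the target of the sampling process.

```python
import math
import random
from collections import Counter

def public_points(universe, t, rng):
    """t public points (leaf, alpha, beta), uniform in U x [0,1] x [0,1]."""
    return [(rng.choice(universe), rng.random(), rng.random()) for _ in range(t)]

def first_accepted(points, p_x, p_y):
    """Index and leaf of the first point with alpha <= p_x[leaf] and beta <= p_y[leaf]."""
    for i, (leaf, alpha, beta) in enumerate(points):
        if alpha <= p_x[leaf] and beta <= p_y[leaf]:
            return i, leaf
    return None, None

# Hypothetical toy protocol: p_x depends on the first bit, p_y on both bits.
universe = ["00", "01", "10", "11"]
p_x = {"00": 0.3, "01": 0.3, "10": 0.7, "11": 0.7}
p_y = {"00": 0.6, "01": 0.4, "10": 0.2, "11": 0.8}
target = {leaf: p_x[leaf] * p_y[leaf] for leaf in universe}

eps = 0.01
t = math.ceil(2 * len(universe) * math.log(1 / eps))   # t = 2|U| ln(1/eps), rounded up

rng = random.Random(1)
trials = 10_000
counts = Counter()
for _ in range(trials):
    _, leaf = first_accepted(public_points(universe, t, rng), p_x, p_y)
    counts[leaf] += 1                                   # leaf is None if no point was accepted
empirical = {leaf: counts[leaf] / trials for leaf in universe}
```

Each point is accepted with probability $\sum_\ell \frac{1}{|\mathcal{U}|} p_x(\ell) p_y(\ell) = 1/|\mathcal{U}|$, which is why looking at $2|\mathcal{U}| \ln(1/\varepsilon)$ points suffices.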

To find $i^*$, each player defines his own set of “potential candidates” for the index $i^*$. Alice defines the set

$$A := \{i \le t : \alpha_i \le p_x(\ell_i) \text{ and } \beta_i \le 2^{8I/\varepsilon} \cdot q_x(\ell_i)\}.$$

Thus $A$ is the set of transcripts which have the correct distribution on the odd nodes (which Alice can verify by herself), and “approximately” satisfy the desirable condition on the even nodes, on which Alice only has a prior estimate ($q_x$). Similarly, Bob defines

$$B := \{i \le t : \beta_i \le p_y(\ell_i) \text{ and } \alpha_i \le 2^{8I/\varepsilon} \cdot q_y(\ell_i)\}.$$


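By Corollary 5.4, $\Pr[i^* \notin A \cap B] \le \varepsilon/8$, so for the rest of the proof we assume that $i^* \in A \cap B$. In fact, $i^*$ is the first element of $A \cap B$. Note that for each point $(\ell_i, \alpha_i, \beta_i)$, $\Pr[i \in A] \le 2^{8I/\varepsilon}/|\mathcal{U}|$. Since we consider only the first $t = 2|\mathcal{U}| \ln(1/\varepsilon)$ points, this implies $\mathbb{E}[|A|] \le 2^{8I/\varepsilon} \cdot 2\ln(1/\varepsilon)$, and a Chernoff bound further asserts that

$$\Pr[|A| > 2^{10I/\varepsilon}] \le \varepsilon/16$$

(and symmetrically for $B$).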

¹⁰One needs to be slightly careful, since the log ratios can in fact be negative, while Markov’s inequality applies
only to non-negative random variables. However, it is well known that the contribution of the negative summands is
bounded, see [Bra12] for a complete proof.














Thus, if we let $\mathcal{E}_1$ denote the event that $i^* \notin A \cap B$, and $\mathcal{E}_2 := \{i^* > t \text{ or } |A| > 2^{10I/\varepsilon} \text{ or } |B| > 2^{10I/\varepsilon}\}$, then by a union bound $\Pr[\mathcal{E}_1 \cup \mathcal{E}_2] \le 2\varepsilon/8 + 3\varepsilon/16 < \varepsilon/2$. Thus, letting $\tau_{x,y}$ denote the distribution of $\ell_{i^*} \mid \neg(\mathcal{E}_1 \cup \mathcal{E}_2)$, the above implies

$$|\tau_{x,y} - \pi_{x,y}| \le \varepsilon/2,$$

as desired. We will now show a (2-round) protocol $\tau$ in which Alice and Bob output a leaf $\ell \sim \tau_{x,y}$, thereby completing the proof. To this end, note we have reduced the simulation task to the problem of finding and outputting the first element in $A \cap B$, where $|A| \le 2^{10I/\varepsilon}$ and $|B| \le 2^{10I/\varepsilon}$. The idea is simple: Alice wishes to send her entire set $A$ to Bob, who can then check for intersection with his set $B$. Alas, explicitly sending each element of $A$ may be too expensive (requires $\log|\mathcal{U}|$ bits), so instead Alice will send Bob sufficiently many ($O(I/\varepsilon)$) random hashes of the elements in $A$, using a publicly chosen sequence of hash functions. Since for $a \in A$ and $b \in B$ such that $a \ne b$, the probability (over the choice of the hash functions) that $h_j(a) = h_j(b)$ for all of the $O(I/\varepsilon)$ indices $j$ is bounded by $2^{-\Omega(I/\varepsilon)}$, a union bound over all pairs ensures that the probability there is an $a \in A$, $b \in B$ such that $a \ne b$ but the hashes happen to match is bounded by $\varepsilon/4$, which completes the proof. For completeness, the protocol $\tau$ is described in Figure 1.

The simulation protocol $\tau$

1. Alice computes the set $A$. If $|A| > 2^{10I/\varepsilon}$ the protocol fails.

2. Bob computes the set $B$. If $|B| > 2^{10I/\varepsilon}$ the protocol fails.

3. For each $a \in A$, Alice computes $d = \lceil 20I/\varepsilon + \log 1/\varepsilon + 2 \rceil$ random hash values $h_1(a), \ldots, h_d(a)$, where the hash functions are evaluated using public randomness.

4. Alice sends the values $\{h_j(a_i)\}_{a_i \in A,\ 1 \le j \le d}$ to Bob.

5. Bob finds the first index $i$ such that there is a $b \in B$ for which $h_j(b) = h_j(a_i)$ for $j = 1..d$ (if such an $i$ exists). Bob outputs $\ell_b$ and sends the index $i$ to Alice.

6. Alice outputs $\ell_{a_i}$.


Figure 1: A simulating protocol for sampling a transcript of $\pi(x, y)$ using $2^{O(I/\varepsilon)}$ communication.

□ 
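Steps 3 to 6 of Figure 1 can be sketched as follows. This is only an illustration under assumptions: a keyed BLAKE2 hash stands in for the publicly chosen random hash functions, each $h_j$ is taken to output one bit (so $d$ hash values are packed into one $d$-bit integer), and the candidate sets are small made-up lists. The names `hash_bits` and `find_first_common` are hypothetical.

```python
import hashlib

def hash_bits(item, d, seed):
    """d pseudo-random bits for `item`; the seed stands in for the public random tape."""
    digest = hashlib.blake2b(item.encode(), key=seed, digest_size=64).digest()
    return int.from_bytes(digest, "big") >> (512 - d)   # keep the top d of 512 bits

def find_first_common(alice_items, bob_items, d, seed=b"public"):
    """Alice sends d hash bits per candidate (in order); Bob reports the first match.

    Returns (index into alice_items, Bob's matching element, bits sent by Alice),
    or (None, None, bits) when no hash matches.
    """
    assert 1 <= d <= 512
    sent = [hash_bits(a, d, seed) for a in alice_items]   # Alice -> Bob: d * |A| bits
    bob_table = {}
    for b in bob_items:
        bob_table.setdefault(hash_bits(b, d, seed), b)
    for i, h in enumerate(sent):
        if h in bob_table:
            return i, bob_table[h], d * len(sent)
    return None, None, d * len(sent)

# Toy candidate sets: the first common index is "42" (position 2 in Alice's list).
A = ["3", "17", "42", "99"]
B = ["5", "42", "77", "99"]
index, match, bits = find_first_common(A, B, d=40)
```

The communication is $d \cdot |A|$ bits plus the index Bob sends back, so with $|A| \le 2^{10I/\varepsilon}$ and $d = O(I/\varepsilon)$ the total is $2^{O(I/\varepsilon)}$, matching Theorem 5.3.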


6 Concluding Remarks and Open Problems 

We have seen that direct sum and product theorems in communication complexity are essentially equivalent to determining the best possible interactive compression scheme. Despite the exciting progress described in this survey, this question is still far from settled, and the natural open problem is closing the gap in (9). The current frontier is trying to improve the dependence on $C$ over the scheme of [BBCR10], even at a possible expense of increased dependence on the information cost:







Open Problem 6.1 (Improving compression for internal information). Given a protocol $\pi$ over inputs $x, y \sim \mu$, with $\|\pi\| = C$, $\mathrm{IC}_\mu(\pi) = I$, is there a communication protocol $\tau$ which $(0.01)$-simulates $\pi$ such that $\|\tau\| \le \mathrm{poly}(I) \cdot C^{1/2 - \varepsilon}$ for some absolute positive constant $0 < \varepsilon < 1/2$?


In fact, by a recent result of Braverman and Weinstein [BW14], even a much weaker compression scheme in terms of $I$, namely $g(I, C) \le 2^{o(I)} \cdot C^{1/2 - \varepsilon}$, would already improve over the state of the art compression scheme ($\tilde{O}(\sqrt{C \cdot I})$) and would imply new direct sum and product theorems.

Another interesting direction, which was unexplored in this survey, is closing the (much smaller) gap in (4.4), i.e., determining whether a logarithmic dependence on $C$ is essential for interactive compression with respect to the external information cost measure.


Open Problem 6.2 (Closing the gap for external compression). Given a protocol $\pi$ over inputs $x, y \sim \mu$, with $\|\pi\| = C$, $\mathrm{IC}^{\mathrm{ext}}_\mu(\pi) = I$, is there a communication protocol $\tau$ which $\delta$-simulates $\pi$ such that $\|\tau\| \le \mathrm{poly}(I) \cdot o(\log(C))$?

It is believed that the $(\log C)$ factor is in fact necessary (see, e.g., the candidate separation sampling problem suggested in [Bra13]), but this conjecture remains to be proved.


Recall that in Section 4.1 we saw direct product theorems for randomized communication complexity, asserting a lower bound on the success rate of computing $n$ independent copies of $f$ in terms of the success of a single copy. When $n$ is very large, such theorems can be superseded by trivial arguments, since $f^n$ must require at least $n$ bits of communication just to describe the output. One could hope to achieve hardness amplification without blowing up the output size, a classical example being Yao’s XOR lemma in circuit complexity. In light of the state-of-the-art direct product result, we state the following conjecture:


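Open Problem 6.3 (A XOR Lemma for communication complexity). Is it true that for any 2-party function $f$ and any distribution $\mu$ on $\mathcal{X} \times \mathcal{Y}$,

$$D_{\mu^n}\left(f^{\oplus n}, 1/2 + e^{-\Omega(n)}\right) = \tilde{\Omega}(\sqrt{n}) \cdot D_\mu(f, 2/3)?$$

(here $f^{\oplus n}((x_1, y_1), \ldots, (x_n, y_n)) := f(x_1, y_1) \oplus \cdots \oplus f(x_n, y_n)$).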

We remark that the “direct-sum” analogue of this conjecture is true: [BBCR10] proved that their direct sum result for $f^n$ can be easily extended to the computation of $f^{\oplus n}$, showing (roughly) that $D_{\mu^n}(f^{\oplus n}, 3/4) = \tilde{\Omega}(\sqrt{n}) \cdot D_\mu(f, 2/3)$. However, this conversion technique does not apply to the direct product setting.